ENT MERFISH Report

1. Overview

1.1 Sample Information

Sample information for the following analysis was compiled from the submission table.

Sample Index and Basic Information
Expt | Sample | Index | Case_Number | Sex | Age | HPV | Sample_Region | Genotype | Group | Region | DataPath | Clinic.comment
1 | NT28530x2 | NT28530 | 1 | M | 52 | Negative | Left posterior BOT | NT | Normal | region_NT28530x2 | Y:_Imaging_data_2\202409271426_20240927ENT28519ImmunOnc500VA067x01_VMSC00101_234 | Left posterior and BOT SCC. 15x years ago treated with chemorad for left tonsil K.
1 | TT28519x1 | TT28519 | 1 | M | 52 | Negative | Left posterior BOT | TT | Tumor | region_TT28519x1 | Y:_Imaging_data_2\202409271426_20240927ENT28519ImmunOnc500VA067x01_VMSC00101_234 | Left posterior and BOT SCC. 15x years ago treated with chemorad for left tonsil K.
1 | TT28519x2 | TT28519_dup | 1 | M | 52 | Negative | Left posterior BOT | TT | Tumor | region_TT28519x2 | Y:_Imaging_data_2\202409271426_20240927ENT28519ImmunOnc500VA067x01_VMSC00101_234 | Left posterior and BOT SCC. 15x years ago treated with chemorad for left tonsil K.
1 | NT28530x1 | NT28530_dup | 1 | M | 52 | Negative | Left posterior BOT | NT | Normal | region_NT28530x1 | Y:_Imaging_data_2\202409271426_20240927ENT28519ImmunOnc500VA067x01_VMSC00101_234 | Left posterior and BOT SCC. 15x years ago treated with chemorad for left tonsil K.

1.2 MERSCOPE Data Quality Summary

These summaries present the data quality assessment that MERSCOPE generates automatically for each experiment. We focus mainly on the transcript level of each sample: we look for high transcript density, judged by the transcript count per field of view (FOV), the transcript density per FOV, and the frequency of detected transcripts.

In general, a log10 transcript count > 4.0 across most of the tissue area can be considered a good quality standard for human tissue.
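
As a rough illustration of this criterion, the sketch below computes the share of FOVs whose log10 transcript count exceeds 4.0. The per-FOV counts and the helper name `fraction_fovs_passing` are hypothetical; the MERSCOPE summary itself is produced by the instrument software, not by this code.

```python
import math

def fraction_fovs_passing(fov_counts, log10_threshold=4.0):
    """Return the fraction of FOVs with log10(transcript count) above the threshold."""
    if not fov_counts:
        raise ValueError("fov_counts must not be empty")
    passing = sum(
        1 for c in fov_counts if c > 0 and math.log10(c) > log10_threshold
    )
    return passing / len(fov_counts)

# Hypothetical per-FOV transcript counts (not taken from this experiment).
example_counts = [25_000, 18_500, 9_200, 41_000, 12_300]
print(f"{fraction_fovs_passing(example_counts):.0%} of FOVs exceed log10 count 4.0")
```

A sample in which most FOVs pass this check would meet the standard described above.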

Note that the low accuracy of the DAPI cell boundaries is not a concern, because a custom cell segmentation step takes over this task.

1.2.1 NT28530x2 (Normal)

1.2.2 TT28519x1 (Tumor)

1.2.3 TT28519x2 (Tumor)

1.2.4 NT28530x1 (Normal)

1.3 Transcript Mismatch at FOV Boundaries

Due to an issue in the Vizgen software, transcripts are aligned incorrectly during decoding, which leads to mismatches at field of view (FOV) boundaries.

Vizgen resolved this issue by temporarily updating the decoding software to an unreleased beta version.

However, this update also changed the output file format from “vzg” to “vzg2”. The current Vizgen Visualizer cannot read the new format, so the Visualizer cannot be used for this dataset.

1.4 Autofluorescence Issue

Autofluorescence is a major problem that limits the sensitivity with which fluorescence from the applied dye or probe can be detected. It is visible in the DAPI images as many over-bright regions.

Autofluorescence may result from incomplete clearing of the tissue. It confounds decoding, so results from the affected cells would be artifacts.

Following Dr. Tan's strong recommendation, we removed the cells affected by autofluorescence to improve accuracy.

NT28530_DAPI_Imaging

NT28530_DAPI_Imaging_small

NT28530_dup_DAPI_Imaging

NT28530_dup_DAPI_Imaging_small

2. Data Processing & Analysis

2.1 Cell Segmentation & Filtering

Using the spatial information and images obtained from MERFISH, we developed a machine learning model based on the Cellpose algorithm to segment individual cells in the MERFISH DAPI images.

To ensure data quality, we set minimum and maximum values for cell volume and a minimum gene count per cell: the cell volume must lie in the range [100, 2000], and the gene count per cell must be > 25.
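
The filter described above can be sketched in plain Python as follows. The record fields `volume` and `gene_count` and the example cells are hypothetical, the volume bounds are treated as inclusive as the bracket notation suggests, and the actual pipeline may apply the same thresholds through Scanpy instead.

```python
MIN_VOLUME, MAX_VOLUME = 100, 2000  # cell volume bounds from the text (units as in the source data)
MIN_GENE_COUNT = 25                 # a cell must exceed this gene count

def passes_qc(cell):
    """Return True if a cell record satisfies both filtering criteria."""
    volume_ok = MIN_VOLUME <= cell["volume"] <= MAX_VOLUME
    count_ok = cell["gene_count"] > MIN_GENE_COUNT
    return volume_ok and count_ok

# Hypothetical cells, for illustration only.
cells = [
    {"id": "c1", "volume": 850.0, "gene_count": 120},
    {"id": "c2", "volume": 60.0, "gene_count": 300},   # volume below the minimum
    {"id": "c3", "volume": 1500.0, "gene_count": 25},  # gene count not above 25
    {"id": "c4", "volume": 2500.0, "gene_count": 90},  # volume above the maximum
]
kept = [c["id"] for c in cells if passes_qc(c)]
print(kept)
```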

After filtering, we obtained a high-quality set of cells and genes for statistical analysis.

All analysis and visualization were performed with Scanpy, a widely used Python toolkit for analyzing single-cell gene expression data.

2.1.2 Transcript Count Violin

The transcript count is a basic measure of data quality. After filtering, every remaining cell has a transcript count > 25.

Transcript Count Violin After Filtering

2.2 Batch Effect & Dimension Reduction

We applied UMAP, a non-linear, manifold-aware dimension reduction algorithm, to embed the cells in two dimensions, and used Leiden clustering to group the cells into clusters. The following annotations are based on these clusters.

UMAP of cells, colored by batch

3. Cell Annotation

To annotate individual cell types, we selected marker genes from the CellMarker database.

We utilize FCRL5 and MZB1 for the identification of B cells, CCND1 and NOTCH1 for cancer cells, CD83 for dendritic cells, PLVAP and PDK4 for endothelial cells, and COL1A1 for fibroblasts. For macrophages, we apply CD14 in conjunction with CD163, while KIT is employed for mast cells, and TRAC, GATA3, and CD5 are used for T cells.
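
One simple way to apply such a marker panel is to score each cluster by the mean expression of each cell type's markers and assign the best-scoring type. The sketch below illustrates that idea only: the function `annotate_cluster` and the expression values are hypothetical, and the report does not state the exact assignment rule that was used.

```python
# Marker panel as listed in the text above.
MARKERS = {
    "B cell": ["FCRL5", "MZB1"],
    "Cancer cell": ["CCND1", "NOTCH1"],
    "Dendritic": ["CD83"],
    "Endothelial": ["PLVAP", "PDK4"],
    "Fibroblast": ["COL1A1"],
    "Macrophage": ["CD14", "CD163"],
    "Mast cell": ["KIT"],
    "T cell": ["TRAC", "GATA3", "CD5"],
}

def annotate_cluster(mean_expr, markers=MARKERS):
    """Assign the cell type whose markers have the highest mean expression.

    mean_expr maps gene name -> mean expression in the cluster; genes that
    are missing from the mapping count as 0.
    """
    scores = {
        cell_type: sum(mean_expr.get(g, 0.0) for g in genes) / len(genes)
        for cell_type, genes in markers.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical cluster-level mean expression values.
cluster_expr = {"TRAC": 2.1, "GATA3": 1.4, "CD5": 1.8, "COL1A1": 0.3, "KIT": 0.1}
label, _ = annotate_cluster(cluster_expr)
print(label)
```

Averaging over the markers of each type keeps cell types with longer marker lists from being favored automatically.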

Note that this annotation is preliminary, and higher resolution is potentially achievable for some cell types. For instance, T cells can be subdivided into regulatory T cells, helper T cells, natural killer T cells, and so on. However, the current sample size is insufficient for such detailed classification; with more samples, further annotation should be feasible.

3.1 Markers for type annotation

3.2 Cell Type Umap

3.3 Cell Type Spatial Map

With the annotations, we can map cell types back to their physical positions and create a spatial map for each sample.

3.4 Cancer & T cell & B cell Spatial Map

3.5 Cell Type Quantification Analysis

Cell Type Count
cell_type_2 NT28530 NT28530_dup TT28519 TT28519_dup Total
B cell 2 3 1230 1031 2266
Cancer cell 34 30 465 362 891
Dendritic 2 0 133 110 245
Endothelial 1609 1427 1722 1341 6099
Fibroblast 2059 1732 19739 18582 42112
Macrophage 308 270 1915 1732 4225
Mast cell 35 33 474 486 1028
Smooth muscle cell 374 285 1644 1367 3670
T cell 115 82 811 676 1684
Total 4538 3862 28133 25687 62220

3.6 Cell Type Proportion Analysis

Based on the quantification of individual cell types, we can compare cell numbers between samples and genotypes. The proportion of each cell type is calculated by dividing the count of that cell type by the total cell count. The results are then visualized in a bar plot.

Notably, the proportions of B cells, mast cells, and T cells are higher in the tumor samples. The increase is largest for B cells, which make up less than 0.1% of cells in the normal samples but about 4% in the tumor samples.
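
The calculation can be reproduced directly from the Cell Type Count table in section 3.5. The sketch below is a minimal plain-Python version that assumes the total is taken per sample; the variable and function names are illustrative, and the plotting step is omitted.

```python
# Cell counts per sample, copied from the Cell Type Count table.
SAMPLES = ["NT28530", "NT28530_dup", "TT28519", "TT28519_dup"]
COUNTS = {
    "B cell": [2, 3, 1230, 1031],
    "Cancer cell": [34, 30, 465, 362],
    "Dendritic": [2, 0, 133, 110],
    "Endothelial": [1609, 1427, 1722, 1341],
    "Fibroblast": [2059, 1732, 19739, 18582],
    "Macrophage": [308, 270, 1915, 1732],
    "Mast cell": [35, 33, 474, 486],
    "Smooth muscle cell": [374, 285, 1644, 1367],
    "T cell": [115, 82, 811, 676],
}

def proportions(counts):
    """Divide each cell-type count by the total cell count of its sample."""
    totals = [sum(col) for col in zip(*counts.values())]
    return {
        cell_type: [n / t for n, t in zip(row, totals)]
        for cell_type, row in counts.items()
    }

props = proportions(COUNTS)
for cell_type in ("B cell", "Mast cell", "T cell"):
    row = ", ".join(f"{s}: {p:.2%}" for s, p in zip(SAMPLES, props[cell_type]))
    print(f"{cell_type:10s} {row}")
```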

4. Differential Gene Expression

Here, we use the Wilcoxon rank-sum test to compute differential gene expression (DE). P values are adjusted using the Benjamini–Hochberg procedure. Genes are considered significant if |log2(Fold Change)| > 2 and p value < 0.05.
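
To make the adjustment and the cut-off concrete, here is a minimal plain-Python sketch of the Benjamini–Hochberg procedure followed by the significance filter. The gene names and values are hypothetical, and applying the 0.05 threshold to the adjusted p value is an assumption; the report does not state whether the raw or the adjusted value was used.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def is_significant(log2fc, p, lfc_cutoff=2.0, p_cutoff=0.05):
    """Apply the cut-off: |log2 fold change| > 2 and p < 0.05."""
    return abs(log2fc) > lfc_cutoff and p < p_cutoff

# Hypothetical genes: (name, log2 fold change, raw p-value).
genes = [
    ("geneA", 3.1, 0.001),
    ("geneB", -2.4, 0.012),
    ("geneC", 0.8, 0.0004),
    ("geneD", 2.2, 0.20),
]
adj = benjamini_hochberg([p for _, _, p in genes])
hits = [name for (name, lfc, _), q in zip(genes, adj) if is_significant(lfc, q)]
print(hits)
```

In this example, geneC fails the fold-change cut-off despite its small p value, and geneD fails the p-value cut-off despite its fold change.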

The results are visualized as a volcano plot, a type of scatterplot that shows statistical significance (P value) versus magnitude of change (fold change).
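
The coordinates of such a plot can be sketched as follows. Several p values in the tables of this section are reported as 0, so the sketch clips them to a small floor before taking the logarithm; the floor value and the function name `volcano_point` are assumptions made for illustration, not part of the original analysis.

```python
import math

def volcano_point(log2fc, p_adj, floor=1e-300):
    """Return (x, y) for a volcano plot: x = log2 fold change, y = -log10(p)."""
    # p-values reported as 0 are clipped so that -log10 stays finite.
    return log2fc, -math.log10(max(p_adj, floor))

# Three rows from the table in section 4.1: (gene, log2FC, adjusted p).
rows = [
    ("FN1", 7.571341, 0.0),
    ("FCRL5", 3.772412, 0.0000276),
    ("LGR6", -2.431690, 0.0275226),
]
for gene, lfc, p in rows:
    x, y = volcano_point(lfc, p)
    print(f"{gene}: x={x:.2f}, y={y:.2f}")
```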

The comparison is between Tumor and Normal samples. A positive fold change means higher expression in the tumor group.

4.1 All Cells

names scores logfoldchanges pvals pvals_adj
COL1A1 128.719830 85.216440 0.0000000 0.0000000
FN1 81.286490 7.571341 0.0000000 0.0000000
FOS -46.969980 -6.759444 0.0000000 0.0000000
PDK4 -60.815765 -6.633553 0.0000000 0.0000000
COL5A1 80.329340 6.229767 0.0000000 0.0000000
EGR1 -34.539925 -5.055663 0.0000000 0.0000000
VEGFA -39.941800 -5.038070 0.0000000 0.0000000
CD79A 5.999517 4.959108 0.0000000 0.0000000
JUN -66.503334 -4.828621 0.0000000 0.0000000
GPX3 -34.578636 -3.987802 0.0000000 0.0000000
MZB1 6.797644 3.979829 0.0000000 0.0000000
SERPINA1 11.236790 3.931262 0.0000000 0.0000000
FCRL5 4.486988 3.772412 0.0000072 0.0000276
CX3CL1 -8.829576 -3.658773 0.0000000 0.0000000
POU2AF1 17.934225 3.616645 0.0000000 0.0000000
FAP 6.151611 3.492644 0.0000000 0.0000000
TNC 32.630886 3.387915 0.0000000 0.0000000
DES -31.869339 -3.284775 0.0000000 0.0000000
TNF 9.065383 3.131130 0.0000000 0.0000000
MYH11 -5.373700 -3.080815 0.0000001 0.0000004
DUSP1 -52.305590 -3.073642 0.0000000 0.0000000
XBP1 6.826031 3.048380 0.0000000 0.0000000
MMP9 11.337405 3.016693 0.0000000 0.0000000
RORC -14.309868 -2.970993 0.0000000 0.0000000
MMP11 14.401033 2.840110 0.0000000 0.0000000
BMP1 54.718426 2.823937 0.0000000 0.0000000
COL11A1 25.157415 2.791966 0.0000000 0.0000000
CCR2 5.268528 2.779894 0.0000001 0.0000006
CR2 4.689729 2.658455 0.0000027 0.0000109
TP63 -4.827339 -2.616780 0.0000014 0.0000057
CXCL1 6.437659 2.539849 0.0000000 0.0000000
PPARGC1A -4.273971 -2.511725 0.0000192 0.0000717
JUNB -35.787884 -2.490444 0.0000000 0.0000000
DERL3 3.310865 2.451468 0.0009301 0.0028356
ITGAX 5.390393 2.434840 0.0000001 0.0000003
LGR6 -2.549464 -2.431690 0.0107889 0.0275226
LGR5 -5.406462 -2.327275 0.0000001 0.0000003
MUC1 9.941775 2.272612 0.0000000 0.0000000
HGF 4.867999 2.261392 0.0000011 0.0000047
CD19 2.838978 2.259758 0.0045258 0.0125717
IRF4 3.012619 2.251352 0.0025900 0.0075292
FLI1 5.158706 2.239080 0.0000002 0.0000011
NCAM1 -16.123442 -2.227492 0.0000000 0.0000000
CLCA1 3.899802 2.198663 0.0000963 0.0003366
PIK3CG 6.211835 2.155691 0.0000000 0.0000000
CD248 30.015327 2.120976 0.0000000 0.0000000
ESCO2 2.790225 2.114788 0.0052671 0.0143129
TNFRSF13C 2.781648 2.100781 0.0054084 0.0145386
CD27 2.849136 2.088706 0.0043838 0.0122453
LOX 4.607403 2.082753 0.0000041 0.0000161
CHEK2 7.125153 2.082282 0.0000000 0.0000000
TNFRSF9 3.862387 2.075009 0.0001123 0.0003845
SPRY2 -21.701843 -2.025791 0.0000000 0.0000000
SH2D1B -3.089103 -2.014404 0.0020076 0.0058702
IL6R -19.968603 -2.006788 0.0000000 0.0000000

4.2 Cancer cell

names scores logfoldchanges pvals pvals_adj
FOS -9.674765 -18.332428 0.0000000 0.0000000
COL1A1 10.273704 15.094367 0.0000000 0.0000000
ATF3 -6.680579 -7.325389 0.0000000 0.0000000
JUNB -6.519753 -6.353756 0.0000000 0.0000000
EGR1 -6.436063 -4.904849 0.0000000 0.0000000
JUN -6.486479 -4.801098 0.0000000 0.0000000
DUSP1 -5.359185 -4.615851 0.0000001 0.0000052
FLT4 3.672781 4.157596 0.0002399 0.0074977
COL5A1 3.294915 3.599521 0.0009845 0.0259082
LMNA -3.604216 -3.569431 0.0003131 0.0092087
FN1 4.274744 3.321740 0.0000191 0.0007973
COL4A1 5.705288 3.206742 0.0000000 0.0000008
THBD -4.691177 -2.850006 0.0000027 0.0001235
PROX1 3.904441 2.778789 0.0000944 0.0036324
MYC -4.748903 -2.713153 0.0000020 0.0001023
EPHB4 5.087696 2.504998 0.0000004 0.0000201
ETS1 3.751177 2.027703 0.0001760 0.0058669
PKM 3.832851 2.000154 0.0001267 0.0045238

4.3 Dendritic

names scores logfoldchanges pvals pvals_adj
NA NA NA NA NA

4.4 Endothelial

names scores logfoldchanges pvals pvals_adj
VEGFA -35.816616 -10.077840 0.0000000 0.0000000
COL1A1 47.572320 9.932630 0.0000000 0.0000000
PLVAP 38.004112 9.664353 0.0000000 0.0000000
PDK4 -34.856445 -7.034097 0.0000000 0.0000000
COL4A1 35.776917 5.580956 0.0000000 0.0000000
PECAM1 27.437399 4.532059 0.0000000 0.0000000
SERPINE1 20.581670 4.145850 0.0000000 0.0000000
COL5A1 16.131900 4.142918 0.0000000 0.0000000
MMP11 3.607988 3.250464 0.0003086 0.0016072
KIT 3.419221 3.240729 0.0006280 0.0030785
RORC -14.886651 -3.145059 0.0000000 0.0000000
CXCL2 -4.145788 -3.131462 0.0000339 0.0002016
LMNA 28.933674 3.093195 0.0000000 0.0000000
ACTA2 10.422846 3.086447 0.0000000 0.0000000
PDGFRB 11.492411 2.996124 0.0000000 0.0000000
CD248 5.186810 2.958897 0.0000002 0.0000015
NEDD4 -14.398186 -2.812049 0.0000000 0.0000000
IL6R -14.629830 -2.766724 0.0000000 0.0000000
MMRN1 6.903936 2.753270 0.0000000 0.0000000
CD276 8.340302 2.740798 0.0000000 0.0000000
PPARGC1A -4.913524 -2.547583 0.0000009 0.0000060
TGFBR2 23.386492 2.544253 0.0000000 0.0000000
FN1 21.659600 2.516487 0.0000000 0.0000000
TGFB1 14.458306 2.510600 0.0000000 0.0000000
SH2D1B -3.437823 -2.446839 0.0005864 0.0029030
ENG 20.407070 2.388739 0.0000000 0.0000000
ETS1 20.198217 2.334731 0.0000000 0.0000000
CTSW 4.125513 2.294079 0.0000370 0.0002126
CLEC14A 20.905638 2.275434 0.0000000 0.0000000
TP63 -4.689392 -2.272294 0.0000027 0.0000176
E2F1 3.179068 2.174574 0.0014775 0.0067775
WWTR1 19.532927 2.166429 0.0000000 0.0000000
VWF 20.648241 2.120249 0.0000000 0.0000000
ITGA5 19.046722 2.072308 0.0000000 0.0000000
SELP 8.094025 2.045839 0.0000000 0.0000000
CD40 11.831773 2.011925 0.0000000 0.0000000

4.5 Fibroblast

names scores logfoldchanges pvals pvals_adj
COL1A1 99.661520 109.207320 0.0000000 0.0000000
FOS -47.059372 -10.880125 0.0000000 0.0000000
EGR1 -50.041943 -10.419629 0.0000000 0.0000000
FN1 58.055080 8.389499 0.0000000 0.0000000
JUN -63.418068 -7.511326 0.0000000 0.0000000
COL5A1 58.653736 6.514852 0.0000000 0.0000000
PDK4 -36.407936 -6.081756 0.0000000 0.0000000
GPX3 -34.492940 -5.962629 0.0000000 0.0000000
SERPINA1 9.758843 4.440912 0.0000000 0.0000000
CCR2 4.496484 3.799526 0.0000069 0.0000326
RET -3.050877 -3.763918 0.0022817 0.0082672
TNC 28.061033 3.746374 0.0000000 0.0000000
DUSP1 -42.866486 -3.717051 0.0000000 0.0000000
FLI1 4.938783 3.521752 0.0000008 0.0000040
SPRY2 -28.405836 -3.440419 0.0000000 0.0000000
BCL2 -21.041565 -3.361053 0.0000000 0.0000000
IL6R -11.209660 -3.353977 0.0000000 0.0000000
TNF 6.995335 3.302661 0.0000000 0.0000000
FAP 5.171738 3.295502 0.0000002 0.0000012
HLA-DPA1 2.532061 3.215192 0.0113394 0.0363443
KLF2 -22.494427 -3.147081 0.0000000 0.0000000
PREX2 -9.170182 -3.143904 0.0000000 0.0000000
TGFBI 14.967690 3.113704 0.0000000 0.0000000
PLA2G2A -10.403379 -2.908959 0.0000000 0.0000000
CR2 4.180466 2.905692 0.0000291 0.0001359
BMP1 42.859085 2.858506 0.0000000 0.0000000
EPHA2 -5.979615 -2.842888 0.0000000 0.0000000
EGFR -36.323550 -2.792461 0.0000000 0.0000000
TEAD4 4.601498 2.774775 0.0000042 0.0000202
LGR5 -5.319131 -2.760852 0.0000001 0.0000006
MMP11 10.715138 2.731303 0.0000000 0.0000000
PGF -6.036967 -2.727679 0.0000000 0.0000000
SOD2 -21.268118 -2.698692 0.0000000 0.0000000
NFKBIA -37.298576 -2.643214 0.0000000 0.0000000
CXCL1 5.260466 2.607160 0.0000001 0.0000008
S100A9 -3.774349 -2.586396 0.0001604 0.0006741
HGF 3.872809 2.472608 0.0001076 0.0004678
CLCA1 3.283746 2.400243 0.0010244 0.0040014
FZD7 -21.664896 -2.394277 0.0000000 0.0000000
JUNB -20.558294 -2.368409 0.0000000 0.0000000
CDK6 12.404665 2.341559 0.0000000 0.0000000
POU2AF1 10.431594 2.292325 0.0000000 0.0000000
CD1C 2.798475 2.265068 0.0051344 0.0171148
CXCL2 7.335484 2.252299 0.0000000 0.0000000
COL11A1 19.515144 2.240313 0.0000000 0.0000000
MET -2.514364 -2.211613 0.0119247 0.0379769
MMP9 8.479343 2.173744 0.0000000 0.0000000
ZAP70 7.108235 2.166870 0.0000000 0.0000000
COL4A1 -14.409939 -2.116672 0.0000000 0.0000000
MUC1 8.429799 2.110764 0.0000000 0.0000000
PDGFRA -24.080532 -2.103632 0.0000000 0.0000000
LOX 3.660348 2.065853 0.0002519 0.0010239
LGALS9 2.994693 2.004917 0.0027472 0.0097419
IL1R2 -3.184731 -2.000739 0.0014489 0.0054882

4.6 Macrophage

names scores logfoldchanges pvals pvals_adj
COL1A1 29.665201 21.313148 0.0000000 0.0000000
FOS -22.637910 -12.442084 0.0000000 0.0000000
MMP9 5.213645 6.499431 0.0000002 0.0000044
SPP1 4.578217 6.127543 0.0000047 0.0000977
FN1 17.229155 5.819446 0.0000000 0.0000000
EGR1 -17.853518 -4.971041 0.0000000 0.0000000
TREM2 5.923193 4.002033 0.0000000 0.0000001
PDK4 -14.654759 -3.942336 0.0000000 0.0000000
MRC1 -16.989233 -3.646870 0.0000000 0.0000000
ITGAX 13.298241 3.640879 0.0000000 0.0000000
COL5A1 10.114567 3.443203 0.0000000 0.0000000
DUSP1 -13.914675 -2.867081 0.0000000 0.0000000
THBD -8.767480 -2.502315 0.0000000 0.0000000
CD248 3.794988 2.477330 0.0001477 0.0018006
PKM 11.442293 2.380384 0.0000000 0.0000000
HIF1A 5.919156 2.233082 0.0000000 0.0000001
LRP1 12.007602 2.061832 0.0000000 0.0000000

4.7 Mast cell

names scores logfoldchanges pvals pvals_adj
COL1A1 11.554090 30.175860 0.0000000 0.0000000
FN1 3.923991 4.566871 0.0000871 0.0075897
VEGFA -6.816688 -4.331221 0.0000000 0.0000000
COL5A1 3.746050 4.181580 0.0001796 0.0128315
SFRP2 3.621788 4.016486 0.0002926 0.0182859
NFKB2 -6.765123 -3.871854 0.0000000 0.0000000
ICAM1 -4.836095 -2.638979 0.0000013 0.0001655
CSF1 3.913213 2.463667 0.0000911 0.0075897

4.8 Smooth muscle cell

names scores logfoldchanges pvals pvals_adj
COL1A1 36.033295 35.535330 0.0000000 0.0000000
COL4A1 27.172250 7.101044 0.0000000 0.0000000
FOS -14.032536 -6.647189 0.0000000 0.0000000
JUN -15.592484 -5.803882 0.0000000 0.0000000
FN1 18.829844 5.127449 0.0000000 0.0000000
TNF 4.487255 4.998678 0.0000072 0.0000863
COL5A1 21.033014 4.751438 0.0000000 0.0000000
ACTA2 -14.657331 -4.576652 0.0000000 0.0000000
JUNB -17.292665 -4.422530 0.0000000 0.0000000
LGR6 -5.308680 -4.066671 0.0000001 0.0000017
MYH11 -15.415986 -4.019932 0.0000000 0.0000000
PDGFRB 11.228720 3.559115 0.0000000 0.0000000
CHEK2 2.926007 3.364132 0.0034334 0.0264110
PDK4 -9.583596 -3.270993 0.0000000 0.0000000
PTGDR2 -2.747317 -3.097388 0.0060085 0.0406734
DUSP1 -14.434421 -3.063678 0.0000000 0.0000000
SERPINE1 9.548772 3.017724 0.0000000 0.0000000
E2F1 3.277156 2.942244 0.0010486 0.0095326
ATF3 -11.208811 -2.790443 0.0000000 0.0000000
PLVAP 6.595327 2.787905 0.0000000 0.0000000
MMP11 7.006973 2.598386 0.0000000 0.0000000
BMP1 14.924097 2.452774 0.0000000 0.0000000
CX3CL1 -3.627839 -2.451389 0.0002858 0.0029164
CD248 13.306128 2.423763 0.0000000 0.0000000
ICAM3 4.481207 2.389812 0.0000074 0.0000863
DES -6.431553 -2.373101 0.0000000 0.0000000
SFRP2 5.309166 2.353069 0.0000001 0.0000017
MYBL2 2.944799 2.263086 0.0032316 0.0256479
CD276 6.821728 2.071905 0.0000000 0.0000000

4.9 T cell

names scores logfoldchanges pvals pvals_adj
COL1A1 18.528063 34.526530 0.0000000 0.0000000
FN1 8.459756 5.744955 0.0000000 0.0000000
CD40LG -5.212253 -2.767354 0.0000002 0.0000311
FOS -3.901277 -2.126768 0.0000957 0.0053159
COL5A1 4.896749 2.073000 0.0000010 0.0000974
CTSW 3.538528 2.022379 0.0004024 0.0167652

5. Supplements

For the original figures and further details, see the Google Drive folder: ENT MERFISH Supplement